ABSTRACT

Many RNA molecules undergo complex maturation, involving e.g. excision from primary transcripts, removal of introns, post-transcriptional modification and polyadenylation. The level of mature, functional RNAs in the cell is controlled not only by synthesis and maturation but also by degradation, which proceeds via many different routes. Systematizing data about RNA metabolic pathways and about the enzymes taking part in RNA maturation and degradation is essential for a full understanding of these processes. RNApathwaysDB, available at http://iimcb.genesilico.pl/rnapathwaysdb, is an online resource about maturation and decay pathways involving RNA as the substrate. The current release presents information about reactions and enzymes that take part in the maturation and degradation of tRNA, rRNA and mRNA, and describes pathways in three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. RNApathwaysDB can be queried with keywords, and sequences of protein enzymes involved in RNA processing can be searched with BLAST. Options for data presentation include pathway graphs and tables with enzymes and literature data. Structures of macromolecular complexes involving RNA and the proteins that act on it are presented as 'potato models' using DrawBioPath, a new JavaScript tool.

INTRODUCTION 

RNA performs many essential roles in living cells, including the transmission of genetic information between DNA and proteins, regulation of many processes and catalysis. In each of these cases, the function of the RNA depends on its nucleotide sequence. The sequence of the mature functional RNA quite often differs from that of the primary transcript, due to a combination of processing events such as excision of functional units from the precursor molecule, removal of introns (splicing), joining of different units (trans-splicing), addition of nucleoside residues absent from the template (by capping, editing or polyadenylation) and changes in the chemical identity of individual residues (by editing or modification) [reviews: (1-5)]. The function of RNA also depends on its localization and on the formation of higher-order structures and complexes with other molecules, in particular proteins and other RNAs (6-9).

Functional RNA molecules that are no longer needed, or that exhibit flaws due to damage or to improper processing, folding or assembly into functional complexes, are eliminated from the cell by degradation, which can also occur by many different pathways (10-12). RNA decay also removes the by-products of gene expression, including excised introns and other RNA pieces released during RNA processing. Finally, RNA degradation pathways eliminate intergenic, intragenic, promoter-associated and antisense RNAs that arise either as regulatory RNAs or as transcriptional noise. The efficiency and specificity of RNA degradation are ensured primarily by a broad spectrum of endo- and exoribonucleases (1,13), but they also depend on the formation of specific structures by the RNAs and on RNA-protein complexes involving regulatory factors (14). RNA decay can itself be regulated, which provides a means to control gene expression and protein levels.

The biogenesis of functional RNAs, as well as RNA decay, in both eukaryotic and prokaryotic cells involves a series of chemical, structural and spatial alterations, in which RNA interacts with various cellular factors, in particular with enzymes that catalyse reactions with RNA molecules as substrates and products (15). It has to be emphasized that RNA processing pathways interact with each other and with other cellular processes, e.g. they can share some steps and/or the proteins involved (1,16). One RNA molecule may also be subject to several enzymatic processes at the same time. Therefore, knowledge of both the entire network of RNA processing events and the proteins responsible for individual transformations is critical for our understanding of RNA metabolism. Many of the proteins involved in RNA processing are very well described (with quantitative and qualitative information regarding substrate specificity, kinetics, mechanisms of action and 3D structure). However, there are still processes for which an enzymatic activity is known or suspected to exist, but the corresponding genes, proteins or enzymes have not yet been characterized. The reconstruction of RNA metabolic pathways and networks can therefore highlight the elements of the system that evidently require additional information. The comparison of metabolic pathways between different species can in turn suggest homologous proteins that may be involved in similar activities.
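How pathway reconstruction exposes gaps, and how a cross-species comparison can point to candidate enzymes, can be illustrated with a minimal sketch. It assumes a hypothetical toy representation (not the RNApathwaysDB schema) in which a pathway maps step names to the set of enzymes assigned to each step, with an empty set marking an activity whose enzyme is unknown. Step and enzyme names are placeholders, and matching steps across species by identical name is a simplifying assumption.

```python
# Hypothetical toy model: a pathway maps each step name to the set of
# enzymes assigned to it; an empty set marks a step whose activity is
# known (or suspected) but whose enzyme has not been characterized.
Pathway = dict[str, set[str]]


def uncharacterized_steps(pathway: Pathway) -> list[str]:
    """Return the steps that have no enzyme assigned, sorted by name."""
    return sorted(step for step, enzymes in pathway.items() if not enzymes)


def candidate_enzymes(query: Pathway, reference: Pathway) -> dict[str, set[str]]:
    """For each gap in `query`, list enzymes that catalyse the same step in
    `reference`; their homologs are candidates for filling the gap.
    Steps are matched by identical name, which is a simplification."""
    return {
        step: reference[step]
        for step in uncharacterized_steps(query)
        if reference.get(step)
    }


if __name__ == "__main__":
    species_a = {"step1": {"EnzA"}, "step2": set(), "step3": {"EnzC"}}
    species_b = {"step1": {"EnzA2"}, "step2": {"EnzB2"}, "step3": set()}
    print(uncharacterized_steps(species_a))         # ['step2']
    print(candidate_enzymes(species_a, species_b))  # {'step2': {'EnzB2'}}
```

In practice, a real comparison would rely on homology searches rather than on shared step names; the sketch only shows the bookkeeping.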

The information on RNA processing events is scattered in the literature, and thus far there has been no database dedicated to the storage and presentation of these data. To this end, we have developed RNApathwaysDB (a database of RNA maturation and decay pathways) as a 'one stop shop' for information about RNA processing pathways and proteins involved in RNA metabolism in model organisms. While the first release is limited to mRNA, tRNA and rRNA metabolism in just three model organisms, we intend to expand it to include other RNAs and species.



DATABASE CONTENT 

Based on a comprehensive survey of the literature and of information available in general-purpose databases, we have compiled the following datasets:

• mRNA, tRNA and rRNA maturation and degradation pathways comprising RNA states and intermediates of processes, connected by particular transformations, such as reactions catalysed by enzymes, binding or release of components, or well-defined, functionally important conformational changes.

• Proteins, enzymatic complexes and catalytic RNA molecules involved in the above-mentioned transformations, together with their known structures.
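The two datasets above can be read as a small object model: RNA states joined by typed transformations, each optionally linked to the proteins, complexes or catalytic RNAs that drive it. The Python sketch below is a hypothetical illustration of that structure; all class and field names are ours and are not taken from RNApathwaysDB.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    """Transformation types named in the pathway dataset description."""
    REACTION = "enzyme-catalysed reaction"
    BINDING = "binding of a component"
    RELEASE = "release of a component"
    CONFORMATION = "conformational change"


@dataclass(frozen=True)
class RNAState:
    name: str  # e.g. a precursor, an intermediate or the mature form


@dataclass(frozen=True)
class Transformation:
    source: RNAState
    target: RNAState
    kind: Kind
    # Names of proteins, enzymatic complexes or catalytic RNAs involved.
    actors: tuple[str, ...] = ()


@dataclass
class Pathway:
    name: str
    organism: str
    category: str  # "maturation" or "degradation"
    steps: list[Transformation] = field(default_factory=list)

    def states(self) -> set[RNAState]:
        """All RNA states touched by any transformation in the pathway."""
        return {s for t in self.steps for s in (t.source, t.target)}

    def actors(self) -> set[str]:
        """All molecules acting anywhere in the pathway."""
        return {a for t in self.steps for a in t.actors}
```

Freezing the state and transformation classes makes them hashable, so they can be collected in sets or used as graph nodes.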

All data items have been curated manually and, whenever possible and reasonable, we provided references to the published experimental reports and/or to other databases. In particular, we linked the entries in RNApathwaysDB to databases such as KEGG (http://www.genome.jp/kegg/) (17) or REACTOME (http://www.reactome.org) (18) in the case of pathways, or UniProt (http://www.uniprot.org/) (19), NCBI databases (20), BRENDA (http://www.brenda-enzymes.info) (21), PFAM (http://pfam.sanger.ac.uk/) (22) and InterPro (http://www.ebi.ac.uk/interpro/) (23) in the case of proteins. RNA molecules are linked to the RFAM database (http://rfam.sanger.ac.uk/) (24), and macromolecular structures are linked to the Protein Data Bank (http://www.pdb.org) (25).



DATABASE ORGANIZATION AND ACCESS 

RNApathwaysDB is a relational database linking the datasets mentioned above, and can be queried via six menus: 'PATHWAYS', 'PROTEINS', 'CATALYTIC RNA MOLECULES', 'ENZYMATIC COMPLEXES', 'STRUCTURES' and 'PUBLICATIONS' (Figure 1).
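One way to picture a relational database linking these datasets is a schema in which reactions belong to pathways and link tables connect reactions to proteins and to publications. The sketch below uses Python's built-in sqlite3 module only so that it runs without a server (RNApathwaysDB itself uses MySQL). All table and column names are hypothetical, and complexes, catalytic RNAs and structures are omitted for brevity.

```python
import sqlite3

# Hypothetical, simplified schema; table and column names are illustrative.
SCHEMA = """
CREATE TABLE pathway  (id INTEGER PRIMARY KEY, name TEXT, organism TEXT);
CREATE TABLE protein  (id INTEGER PRIMARY KEY, name TEXT, organism TEXT);
CREATE TABLE reaction (id INTEGER PRIMARY KEY,
                       pathway_id INTEGER REFERENCES pathway(id),
                       source_state TEXT, target_state TEXT);
CREATE TABLE publication (id INTEGER PRIMARY KEY, pmid INTEGER UNIQUE);
-- Link tables: many-to-many relations between reactions and proteins/papers.
CREATE TABLE reaction_protein (
    reaction_id INTEGER REFERENCES reaction(id),
    protein_id  INTEGER REFERENCES protein(id));
CREATE TABLE reaction_publication (
    reaction_id    INTEGER REFERENCES reaction(id),
    publication_id INTEGER REFERENCES publication(id));
"""


def connect() -> sqlite3.Connection:
    """Open an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def pathways_using_protein(conn: sqlite3.Connection, protein: str) -> list[str]:
    """Names of all pathways in which the given protein acts on some step."""
    rows = conn.execute(
        """SELECT DISTINCT pw.name
             FROM pathway pw
             JOIN reaction r          ON r.pathway_id = pw.id
             JOIN reaction_protein rp ON rp.reaction_id = r.id
             JOIN protein p           ON p.id = rp.protein_id
            WHERE p.name = ?
            ORDER BY pw.name""",
        (protein,),
    )
    return [name for (name,) in rows]
```

Queries of this kind, which join through link tables, are what let a single protein page list every pathway in which that protein appears.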

PATHWAYS: This menu provides access to data on different processes (pathways) involving mRNA, tRNA and rRNA, classified as maturation (from a primary transcript to a mature form) or degradation (decomposition of the mature form) for a set of model organisms. As of 25 September 2012 there are altogether 33 pathways for 3 model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. All pathways are visualized as graphs created with PyGraphviz (http://networkx.lanl.gov/pygraphviz). RNA states are represented as graph nodes, whereas transitions between them (e.g. enzymatic reactions) are visualized as edges, shown as directed arrows that link individual states. Each node and each edge is hyperlinked to a static page displaying basic information about the corresponding stage of RNA processing or the reaction leading from one stage to another. These pages comprise pictures and detailed information about the proteins, enzymatic complexes and RNA molecules taking part in a given process. All types of data are linked to relevant publications, if available. As of 25 September 2012, there are 289 RNA states and 294 reactions in RNApathwaysDB.

PROTEINS: As of 25 September 2012, RNApathwaysDB stores information about 17, 85 and 128 proteins from E. coli, S. cerevisiae and H. sapiens, respectively. This dataset does not cover all proteins involved in RNA metabolism in these species; it is limited to proteins that are well characterized and whose role in RNA metabolism or RNA degradation can be clearly defined and connected to one or more distinct steps in the enzymatic transformation of RNAs covered by RNApathwaysDB. All proteins are linked to the genes that encode them. The collected data comprise names of genes and proteins together with their sequences from the NCBI databases, 3D structures from the PDB and other details relevant for a particular entry, based on information from other databases (e.g. the type of enzymatic activity, the presence of isoforms, cellular/tissue and subcellular localization, together with links to the relevant database entries). For the eukaryotic mRNA maturation pathways involving splicing by the spliceosome (a large multi-component complex), we have not listed all individual parts of the system, but provided links to the Spliceosome Database (http://spliceosomedb.ucsc.edu) and to SpliProt3D (http://iimcb.genesilico.pl/SpliProt3D/) (26). Links to selected relevant publications are also provided.

Figure 1. Content of the RNApathwaysDB.

CATALYTIC RNA MOLECULES: This section stores data about RNA molecules that are catalytically active and can act either separately or as elements of larger complexes. For each RNA molecule, data such as the molecule name, the nucleotide sequence and a link to the homologous family in the RFAM database are provided, if available.

ENZYMATIC COMPLEXES: Currently, RNApathwaysDB stores information about 37 enzymatic complexes: 2 from E. coli, 25 from H. sapiens and 10 from S. cerevisiae. For each complex, RNApathwaysDB provides an overall description, a simplified 2D diagram (a 'potato model') and links to selected relevant publications, if available.

STRUCTURES: For proteins in RNApathwaysDB whose NMR or X-ray structures are available, atomic coordinates from the PDB are provided. Currently RNApathwaysDB includes 216 structural models.

PUBLICATIONS: Literature references to entries in PubMed (20) have been compiled into an additional dataset, currently comprising 1541 entries.

The full catalogue of data about RNA maturation and degradation pathways is a moving target. The current listings in RNApathwaysDB should be considered provisional and will not be complete for some time. We intend to update the information on RNA maturation and degradation pathways with new discoveries, e.g. new enzymatic steps that will be identified in the future. We also encourage users to report any known reactions and enzymes that may be missing from RNApathwaysDB. In the future, we plan to expand the repertoire of pathways to include other types of RNAs, in particular small non-coding RNAs of various types. We also intend to extend the dataset to encompass additional model organisms with well-characterized RNA metabolism, e.g. representatives of Gram-positive bacteria and plants. Along with the expansion of the pathways, we will also expand the associated datasets of proteins, RNAs and complexes. We therefore invite experts interested in the inclusion or refinement of information on particular systems in RNApathwaysDB to collaborate on data gathering and curation.

IMPLEMENTATION 

RNApathwaysDB has been implemented using the Django Web Framework (http://www.djangoproject.com/) and uses a MySQL relational database to store data. The features provided by the web page are: a user profile and login, a search tool that accepts either a keyword or a protein sequence, wiki-like pages that enable editing of entries, and a link to a new JavaScript image drawing tool.

Viewing the content of the database does not require logging in or registration; editing the content, however, does require registration. Logging in not only reveals the wiki-like pages of all the entries but also gives access to the otherwise hidden administration site, which provides tools to add, edit or delete information. Users can also add comments and new references to existing records.

DrawBioPath is a new image drawing tool (accessible via the 'draw a picture' link in the main menu, which redirects to http://drawbiopath.genesilico.pl/) that has been developed for creating graphical representations of metabolic pathways of DNA and RNA, and was used for creating all images in RNApathwaysDB. The tool is written in JavaScript and is based on the SVG-edit web editor (http://code.google.com/p/svg-edit/). The drawing engine uses the SVG format defined by the W3C, which enables resizing the images without loss of quality and makes it possible to modify the images with external tools for processing vector graphics, e.g. Inkscape. Compared with the drawing tool available in REPAIRtoire (27), another database of nucleic acid metabolism, DrawBioPath is more convenient and easier to use. DrawBioPath provides a library of shapes that can be used for drawing RNA and DNA molecules, proteins or irregular objects. The tool also allows users to upload their own graphics files in SVG format and to modify them using either a graphical interface or a text editor.
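Because DrawBioPath stores drawings as SVG, a 'potato model' is ultimately a set of vector shapes with labels. As a rough illustration only (DrawBioPath itself is JavaScript, and this is not its code), the Python sketch below uses xml.etree.ElementTree to write an SVG with one labelled ellipse per complex component. The function name, component names, coordinates and colours are made-up inputs.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace


def _tag(name: str) -> str:
    """Qualify an element name with the SVG namespace."""
    return f"{{{SVG_NS}}}{name}"


def potato_model(components, width=400, height=200) -> str:
    """Draw each (label, cx, cy, rx, ry, fill) component as a labelled
    ellipse and return the SVG document as a string."""
    svg = ET.Element(_tag("svg"), width=str(width), height=str(height),
                     viewBox=f"0 0 {width} {height}")
    for label, cx, cy, rx, ry, fill in components:
        group = ET.SubElement(svg, _tag("g"))  # one group per component
        ET.SubElement(group, _tag("ellipse"), cx=str(cx), cy=str(cy),
                      rx=str(rx), ry=str(ry), fill=fill, stroke="black")
        text = ET.SubElement(group, _tag("text"),
                             {"x": str(cx), "y": str(cy), "text-anchor": "middle"})
        text.text = label
    return ET.tostring(svg, encoding="unicode")
```

Since the output is plain SVG, it could be opened and edited further in SVG-edit or Inkscape, which is the interoperability the text attributes to the format.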

RNApathwaysDB can be queried in two ways: using a keyword or a protein sequence. Keyword search is a simple text search tool available in the main menu. It returns a structured list of the database entries that contain the query, with the part of the text corresponding to the query highlighted in red. Protein sequence search is available via the 'SEARCH' link in the main menu and is performed with the BLAST program. There is also a utility that sends the protein sequence from an RNApathwaysDB protein entry to BLAST on the NCBI web server.
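The text does not say how the sequence is handed to NCBI. One possible mechanism is NCBI's BLAST URL API, in which a request with CMD=Put submits a search. The sketch below only builds such a URL and does not send it. The parameter names follow NCBI's public documentation for that API; whether RNApathwaysDB uses it is an assumption, and the helper names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Base address of NCBI's BLAST URL API.
NCBI_BLAST = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def clean_protein(seq: str) -> str:
    """Drop FASTA header lines and whitespace; upper-case the residues."""
    body = [ln for ln in seq.splitlines() if not ln.lstrip().startswith(">")]
    return re.sub(r"\s+", "", "".join(body)).upper()


def blastp_put_url(sequence: str, database: str = "nr") -> str:
    """Build a CMD=Put request URL that would submit a blastp search."""
    query = clean_protein(sequence)
    if not re.fullmatch(r"[A-Z]+", query):
        raise ValueError("expected a protein sequence of letters only")
    params = {"CMD": "Put", "PROGRAM": "blastp",
              "DATABASE": database, "QUERY": query}
    return NCBI_BLAST + "?" + urlencode(params)
```

According to NCBI's documentation, the response to such a request contains a request ID (RID) that is then polled with CMD=Get. NCBI also asks automated clients to limit how often they submit and poll, so a production tool would need to respect those usage guidelines.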



DISCUSSION 

The metabolism of nucleic acids is a very old and extensive field of research. Networks of relationships and interactions in RNA metabolic pathways are extremely complex and the amount of data is huge; however, different pieces of information are scattered across different resources, in the literature and in various databases. Information about RNA processing pathways and their components is available to some extent in general-purpose pathway databases such as KEGG, REACTOME, BioCyc (http://biocyc.org) (28) or WikiPathways (http://www.wikipathways.org/) (29). However, none of these databases contains a complete description of RNA maturation and degradation pathways. Thus, the vast quantity of data and the lack of a single bioinformatics resource dedicated to RNA processing and degradation prompted us to develop a database with a web server that not only gathers information on RNA processing pathways but also helps to systematize this knowledge and makes it easily accessible to non-specialist users.

RNApathwaysDB contains more detailed information than the databases mentioned above, including information on various aspects of RNA maturation, such as mRNA capping, splicing and polyadenylation, tRNA biosynthesis and rRNA maturation. The database also describes RNA degradation pathways, such as rapid tRNA decay, tRNA nuclear surveillance, mRNA decay pathways and rRNA quality control pathways. An important component of RNApathwaysDB is the dataset of enzymatic complexes together with their macromolecular structures visualized as 'potato models', which can be used for illustration, e.g. in teaching. To our knowledge, no other available resource provides such a range of information on RNA metabolism.

In the development of RNApathwaysDB we drew on our experience from the work on the MODOMICS database of RNA modification pathways (30) and from the more recent work on the REPAIRtoire database of DNA repair pathways (27). We hope that RNApathwaysDB will become as popular and useful for the community of researchers working on RNA processing pathways as MODOMICS has become in the RNA modification field. In the future, we plan to integrate the databases on different aspects of RNA metabolism using a common data model and a joint interface. Another envisaged direction of development is to link the generic RNA representations in RNApathwaysDB to particular mature RNA sequences that will be stored in the future RNAcentral database (31) and to prokaryotic and eukaryotic genome sequence databases, so that e.g. cleavage sites in maturation reactions could be mapped onto the precursor RNA molecules.
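Once generic RNA states are linked to concrete sequences, mapping a maturation step onto a precursor could, in the simplest case, reduce to locating the mature sequence within the precursor. The sketch below assumes contiguous excision only. It ignores splicing, editing and residues added post-transcriptionally (such as a poly(A) tail), which would have to be removed or modelled separately. The function name and the 0-based, half-open coordinate convention are our own choices.

```python
def _norm(seq: str) -> str:
    """Upper-case and write T as U so DNA- and RNA-style strings compare."""
    return seq.upper().replace("T", "U")


def map_cleavage_sites(precursor: str, mature: str) -> tuple[int, int]:
    """Return 0-based (five_prime, three_prime) cut positions such that
    precursor[five_prime:three_prime] equals mature (after normalization).
    Raises ValueError if the mature sequence is empty, absent or ambiguous."""
    pre, mat = _norm(precursor), _norm(mature)
    if not mat:
        raise ValueError("mature sequence is empty")
    start = pre.find(mat)
    if start == -1:
        raise ValueError("mature sequence not found in precursor")
    if pre.find(mat, start + 1) != -1:
        raise ValueError("mature sequence maps to more than one position")
    return start, start + len(mat)
```

For example, the mature fragment GCUU inside the precursor AAGCUUGG maps to cuts at positions 2 and 6, i.e. a two-nucleotide 5' leader and a two-nucleotide 3' trailer are removed.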



AVAILABILITY 

The content of the RNApathways database is freely available at the URL http://iimcb.genesilico.pl/rnapathwaysdb/. The software for generation of custom images for biological processes can be accessed at the URL http://drawbiopath.genesilico.pl/. Researchers interested in adding or curating data (proteins, features, complexes, pathways, etc.) or in implementing options that are not yet available are encouraged to contact J.M.B. (at iamb@genesilico.pl).